Using Deepchecks Vision With a Few Lines of Code
Deepchecks Vision is built to validate your data and model, however complex they may be. That said, sometimes there is no need to write a full-blown ClassificationData or DetectionData implementation. For a simple classification task, quite a few checks can be run with only a few lines of code. In this tutorial, we show how to run all checks that do not require a model on a simple classification task.
This is ideal, for example, when you receive a new dataset for a classification task. Running these checks before you even start training gives you a quick picture of what the dataset looks like and what potential issues it contains.
Defining the data
We use the hymenoptera (ants and bees) dataset, which is available for download from the PyTorch website. We download it and extract it to the current directory.
import os
import urllib.request
import zipfile

url = 'https://download.pytorch.org/tutorial/hymenoptera_data.zip'

# Download the archive and extract it into the current directory.
urllib.request.urlretrieve(url, 'hymenoptera_data.zip')
with zipfile.ZipFile('hymenoptera_data.zip', 'r') as zip_ref:
    zip_ref.extractall('.')

# Rename the val folder to test, because the simple classification loader
# expects the two splits to be named train and test.
if not os.path.exists('hymenoptera_data/test'):
    os.rename('hymenoptera_data/val', 'hymenoptera_data/test')
Loading a Simple Classification Dataset
A simple classification dataset is an image dataset structured in the following way:
- root/
    - train/
        - class1/
            image1.jpeg
    - test/
        - class1/
            image1.jpeg
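Before loading, it can help to confirm that a folder actually follows this layout. The sketch below is a minimal, standard-library-only check. It assumes that every subfolder of train/ and test/ is one class and that both splits should contain the same classes. `check_layout` is a hypothetical helper written for this illustration, not part of deepchecks.

```python
import os
import tempfile


def check_layout(root):
    """Hypothetical helper: return the sorted class names shared by
    root/train and root/test.

    Raises ValueError if a split folder is missing or if the two splits
    do not contain the same class subfolders.
    """
    classes = {}
    for split in ("train", "test"):
        split_dir = os.path.join(root, split)
        if not os.path.isdir(split_dir):
            raise ValueError(f"missing split folder: {split_dir}")
        # Assumption: each immediate subfolder of a split is one class.
        classes[split] = sorted(
            name for name in os.listdir(split_dir)
            if os.path.isdir(os.path.join(split_dir, name))
        )
    if classes["train"] != classes["test"]:
        raise ValueError(
            f"class mismatch: train={classes['train']} test={classes['test']}"
        )
    return classes["train"]


if __name__ == "__main__":
    # Build a tiny throwaway tree that follows the expected layout.
    with tempfile.TemporaryDirectory() as root:
        for split in ("train", "test"):
            for cls in ("ants", "bees"):
                os.makedirs(os.path.join(root, split, cls))
        print(check_layout(root))  # ['ants', 'bees']
```

After the download and rename step, calling `check_layout('hymenoptera_data')` should return `['ants', 'bees']`, since that dataset has one folder per insect class in each split.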
from deepchecks.vision.simple_classification_data import load_dataset

# Load each split from its folder as a VisionData object; the images in this dataset are .jpg files.
train_ds = load_dataset('hymenoptera_data', train=True, object_type='VisionData', image_extension='jpg')
test_ds = load_dataset('hymenoptera_data', train=False, object_type='VisionData', image_extension='jpg')
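One way to picture what a folder-based loader does with this layout, and why `image_extension` matters, is the sketch below. It is a hypothetical illustration in the spirit of torchvision's `ImageFolder`, not deepchecks' actual implementation: it assigns class indices in sorted order of the class folder names and keeps only files whose extension matches. The function name `index_images` is made up for this example.

```python
import os


def index_images(root, split, image_extension="jpg"):
    """Hypothetical sketch: list (path, class_index) pairs for one split.

    Class indices follow the sorted order of the class folder names, and
    only files ending in "." + image_extension (case-insensitive) are kept.
    """
    split_dir = os.path.join(root, split)
    class_names = sorted(
        d for d in os.listdir(split_dir)
        if os.path.isdir(os.path.join(split_dir, d))
    )
    samples = []
    suffix = "." + image_extension.lower()
    for index, name in enumerate(class_names):
        class_dir = os.path.join(split_dir, name)
        for filename in sorted(os.listdir(class_dir)):
            # Skip anything that does not carry the requested extension.
            if filename.lower().endswith(suffix):
                samples.append((os.path.join(class_dir, filename), index))
    return class_names, samples
```

Counting the labels, for example with `collections.Counter(label for _, label in samples)`, gives a per-class image count, which is a quick first look at class balance before any checks run.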
Running Deepchecks' Full Suite
That's it: we have defined the classification data objects and are ready to run the train_test_validation suite:
from deepchecks.vision.suites import train_test_validation
suite = train_test_validation()
result = suite.run(train_ds, test_ds)
Out:
Validating Input: 0%| | 0/1 [00:00<?, ? /s]
Ingesting Batches - Train Dataset: 100%|########| 8/8 [00:21<00:00, 2.39s/ Batch, Batch=88%]
Ingesting Batches - Test Dataset: 100%|#####| 5/5 [00:16<00:00, 3.32s/ Batch, Batch=80%]
Computing Checks: 100%|######| 6/6 [00:00<00:00, 5.46 Check/s, Check=Simple Feature Contribution]
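One of the checks computed above, Train Test Label Drift, compares how the labels are distributed in the train and test sets. To make the idea concrete, the sketch below measures the gap between two class distributions with total variation distance. This is a simplified stand-in chosen for illustration: the deepchecks check may use a different drift measure and adds its own conditions and display. The counts in the example are made-up numbers, not the hymenoptera counts.

```python
def label_proportions(counts):
    """Turn {label: count} into {label: fraction of all samples}."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("counts must contain at least one sample")
    return {label: n / total for label, n in counts.items()}


def total_variation_distance(train_counts, test_counts):
    """Half the L1 distance between two label distributions.

    0.0 means identical class proportions; 1.0 means no class in common.
    """
    p = label_proportions(train_counts)
    q = label_proportions(test_counts)
    labels = set(p) | set(q)
    # A label missing from one split counts as proportion 0.0 there.
    return 0.5 * sum(abs(p.get(label, 0.0) - q.get(label, 0.0)) for label in labels)


# Hypothetical counts, for illustration only.
train_counts = {"ants": 50, "bees": 50}
test_counts = {"ants": 25, "bees": 75}
print(total_variation_distance(train_counts, test_counts))  # 0.25
```

Here a quarter of the probability mass would have to move between classes to make the test label distribution match the train one. A value close to 0 suggests the two splits are balanced alike.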
Observing the Results
The results can be saved as an HTML file with the following code:
result.save_as_html('output.html')
Or, if you are working inside a notebook, the output can be displayed directly by evaluating the result object as the last expression in a cell:
result